

Section: New Results

Distributed algorithms

Participants: Luciana Arantes [correspondent], Franck Petit, Maria Potop-Butucaru [correspondent], Swan Dubois, Pierre Sens, Julien Sopena.

Our current research on distributed algorithms focuses on two main axes. First, we are interested in providing fault-tolerant and self-* (self-organizing, self-healing and self-stabilizing) solutions for fundamental problems in distributed computing. More precisely, we target the following basic building blocks: mutual exclusion, resource allocation, agreement and communication primitives.

Second, in dynamic systems we are interested in designing building blocks for distributed applications, such as failure detectors, adequate communication primitives (publish/subscribe) and overlays. Moreover, we are interested in solving fundamental problems such as synchronization, leader election, membership and naming, and diffusion of information.

Failure Detectors for Dynamic Systems

Since 2009, we have been exploring distributed computing models for dynamic networks such as MANETs or wireless sensor networks. The temporal variations in the network topology imply that these networks cannot be viewed as a static connected graph over which paths between nodes are established beforehand: a path between two nodes is in fact built over time. Furthermore, the lack of connectivity between nodes (temporary or not) makes dynamic networks partitionable systems, i.e., systems in which nodes that neither crash nor leave the system may still be unable to communicate with each other. A failure detector is a fundamental service that helps in the development of fault-tolerant distributed systems. In 2011, we proposed a new protocol which implements an eventually strong failure detector (◊S) on a dynamic network with unknown membership. Our failure detector has the interesting feature of being time-free, i.e., it does not rely on timers to detect failures; moreover, it tolerates node mobility and message losses [41].
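The time-free flavor of such detectors can be illustrated by a classic query/response suspicion rule: instead of waiting on a timeout, a process waits for the first n−f responses to a query and suspects everyone else. The sketch below is only a toy simulation of that generic rule under assumed parameters; it is not the protocol of [41].

```python
import random

# Toy simulation of a time-free suspicion rule in the spirit of
# query/response failure detectors: after each query, a process waits
# for the first (n - f) responses and suspects everyone else.
# Names and structure are illustrative, not the protocol of [41].

def query_round(n, crashed, f, rng):
    """Return the suspect set computed by one query round."""
    responders = [p for p in range(n) if p not in crashed]
    rng.shuffle(responders)            # nondeterministic arrival order
    first = set(responders[: n - f])   # first n - f responses received
    return set(range(n)) - first       # everyone else is suspected

rng = random.Random(42)
n, f = 5, 2
crashed = {3}                          # process 3 has crashed

suspects = query_round(n, crashed, f, rng)
# Crashed processes never respond, so they are always suspected:
assert crashed <= suspects
```

Because a crashed process can never be among the first responders, completeness holds in every round; eventual accuracy rests on behavioral assumptions about which correct processes answer fastest, which is where the timer-free analysis comes in.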

Self-* Distributed Algorithms

The main challenge of our research activity in 2011 was to develop self-* (self-stabilizing, self-organizing and self-healing) distributed algorithms for various types of networks. Self-stabilization is a general technique for designing distributed systems that tolerate arbitrary transient faults. Since topology changes can be considered as transient faults, self-stabilization turns out to be a good approach for dealing with dynamic networks. This is particularly relevant when the distributed (self-stabilizing) protocol does not require any global parameters, such as the number of nodes or the diameter of the network. With such a self-stabilizing protocol, global parameters in the program need not be changed when nodes join or leave the system. Therefore, self-stabilization is very desirable for achieving scalability and dynamicity.

  • Snap-stabilizing Committee Coordination. The classic committee coordination problem characterizes a general type of synchronization called n-ary rendezvous as follows (K. M. Chandy and J. Misra, Parallel Program Design: A Foundation, Addison-Wesley Longman Publishing Co., Inc., 1988):

    “Professors in a certain university have organized themselves into committees. Each committee has an unchanging membership roster of one or more professors. From time to time a professor may decide to attend a committee meeting; it starts waiting and remains waiting until a meeting of a committee of which it is a member is started. All meetings terminate in finite time. The restrictions on convening a meeting are as follows: (1) a meeting of a committee may be started only if all members of that committee are waiting, and (2) no two committees may convene simultaneously if they have a common member. The problem is to ensure that (3) if all members of a committee are waiting, then a meeting involving some member of this committee is convened.”

    In [31], we propose two snap-stabilizing distributed algorithms for the committee coordination problem. Snap-stabilization is a versatile technique for designing algorithms that efficiently tolerate transient faults: after a finite number of such faults (e.g., memory corruptions, message losses), a snap-stabilizing algorithm immediately operates correctly, without any external intervention. The first algorithm maximizes concurrency, whereas the second maximizes fairness.
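The safety rules (1) and (2) quoted above can be captured by a simple predicate over the current configuration. The sketch below encodes only that predicate, not the snap-stabilizing algorithms of [31]; the professor names are illustrative.

```python
# Illustrative check of the convening rules (1) and (2) from the
# committee coordination problem statement; this is not the
# snap-stabilizing algorithm of [31], only the safety predicate
# any solution must enforce when convening a meeting.

def may_convene(committee, waiting, in_session):
    """committee: set of professors; waiting: set of waiting professors;
    in_session: list of committees currently meeting."""
    all_waiting = committee <= waiting                           # rule (1)
    no_conflict = all(not (committee & c) for c in in_session)   # rule (2)
    return all_waiting and no_conflict

waiting = {"chandy", "misra", "dijkstra"}
in_session = [{"lamport", "dijkstra"}]

assert may_convene({"chandy", "misra"}, waiting, in_session)
assert not may_convene({"misra", "dijkstra"}, waiting, in_session)  # shared member
```

Rule (3), the progress guarantee, is precisely what cannot be expressed as a local predicate; arbitrating between maximal concurrency and fairness in enforcing it is what distinguishes the two algorithms.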

  • Snap-stabilizing Message Forwarding. We focus on end-to-end delivery of request and response messages carried over the network. This problem is also known as the message forwarding problem. It consists in managing network resources in order to forward messages, i.e., in protocols allowing messages to move from a sender to a receiver over the network. Combined with a self-stabilizing routing protocol, achieving snap-stabilization for the message forwarding problem is a very desirable property, because every message sent by the sender is delivered in finite time to the receiver. In other words, no message sent after the system started is lost. In [46], we present a snap-stabilizing algorithm for message forwarding that works on a tree topology. It uses a constant number of buffers per link, namely 2δ+1 buffers per node, where δ is the degree of the node. Therefore, it is particularly well suited for large-scale and dynamic systems, e.g., overlays used in peer-to-peer systems.
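The 2δ+1 bound makes the memory footprint of a node depend only on its local degree, not on the network size. The sketch below merely computes that bound on a hypothetical example tree; it does not implement the forwarding protocol of [46].

```python
# The buffer requirement stated above is 2*delta + 1 slots per node,
# delta being the node's degree in the tree. This sketch only computes
# that bound on an example tree; the tree and node names are
# illustrative, and the forwarding protocol itself is not implemented.

def buffers_per_node(tree):
    """tree: adjacency dict of an undirected tree."""
    return {v: 2 * len(nbrs) + 1 for v, nbrs in tree.items()}

# A small tree:   a - b - c
#                     |
#                     d
tree = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
counts = buffers_per_node(tree)
assert counts["b"] == 7   # internal node of degree 3 -> 2*3 + 1
assert counts["a"] == 3   # leaf -> 2*1 + 1
```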

  • Self-Organizing Swarms of Robots.

    Consider a distributed system where the computing units are weak mobile robots, i.e., devices equipped with sensors and designed to move. By weak, we mean that the robots are anonymous, autonomous, disoriented, and oblivious, i.e., devoid of (1) any local parameter (such as an identity) allowing to differentiate any of them, (2) any central coordination mechanism or scheduler, (3) any common coordinate system or common sense of direction, and (4) any way to remember any previous observation or computation performed in a previous step. Furthermore, all the robots follow the same program (uniform or homogeneous), and there is no explicit communication medium. The robots implicitly “communicate” by observing the positions of the other robots. Two different environments are considered: (i) the continuous two-dimensional Euclidean space, wherein robots can observe, compute and move with infinite precision, and (ii) the discrete model, in which the space is partitioned into a finite number of locations, conveniently represented by a graph where nodes represent locations that can be sensed and edges represent the possibility for a robot to move from one location to another.

    During 2011, we mainly investigated the following problems: gathering in the plane, and the exploration of a finite discrete environment.

    1. Gathering. This problem can be stated as follows: robots, initially located at various positions, must gather at the same position in finite time and remain at this position thereafter. In [20], we investigate the self-stabilizing gathering problem in the plane, that is, gathering the robots deterministically with no restriction on the initial configuration. In particular, robots are allowed to share the same position in the initial configuration. Strong multiplicity detection is the ability for the robots to count the exact number of robots located at a given position. We show that, assuming strong multiplicity detection, the self-stabilizing gathering problem with n weak robots is solvable in the semi-synchronous model if, and only if, n is odd: the problem is impossible with an even number of robots, whereas it becomes solvable with an odd number of them. Our proof is constructive, as we present and prove a deterministic self-stabilizing algorithm for the gathering problem.

      In [19], we address gathering in the discrete environment. We prove asymptotic time and space complexity lower bounds for solving the problem, and we propose an algorithm that is asymptotically optimal in both space and round complexities. Finally, we show that most of the assumptions we made are necessary to deterministically solve the rendezvous in our initial scenario.
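Strong multiplicity detection, as defined in the gathering discussion above, amounts to an exact census of each occupied position in the observed snapshot. The sketch below shows only that sensing capability on a made-up configuration; it is not the gathering algorithm of [20].

```python
from collections import Counter

# Strong multiplicity detection lets a robot count exactly how many
# robots occupy each position it observes. This toy snapshot
# illustrates only that capability; the positions are made up and the
# gathering algorithm of [20] is not implemented here.

def multiplicities(positions):
    """positions: list of (x, y) robot positions; returns exact counts."""
    return Counter(positions)

snapshot = [(0, 0), (1, 2), (0, 0), (1, 2), (0, 0)]   # n = 5 (odd)
mult = multiplicities(snapshot)
assert mult[(0, 0)] == 3 and mult[(1, 2)] == 2
```

With weak multiplicity detection a robot would only learn whether a position holds one robot or more than one; the odd/even dichotomy above relies on the exact counts being observable.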

    2. Graph Exploration.

      The exploration problem consists in visiting a discrete and finite environment with a swarm of robots. We consider two types of exploration: finite exploration and perpetual exploration. The former requires that k robots, initially placed at different nodes, collectively explore a graph before stopping forever. By “collectively” explore, we mean that every node is eventually visited by at least one robot. In [73], we propose solutions, optimal with respect to the number of robots, for the deterministic exploration of a grid-shaped network by a team of k oblivious robots in the non-atomic asynchronous model. In more detail, we first show that no exploration protocol exists with fewer than three robots for any grid with more than three nodes, fewer than four robots for the (2,2)-Grid, and fewer than five robots for the (3,3)-Grid. Next, we show that the problem is solvable using only three robots for any (i,j)-Grid, provided that j>3. Our result is constructive, as we present a deterministic algorithm that performs in the non-atomic asynchronous model. We also present specific deterministic protocols for the (3,3)-Grid using five robots.

      The second type of exploration is perpetual exploration, which requires every possible location to be visited by each robot infinitely often. In [32], we investigate the exclusive perpetual exploration of grid-shaped networks. We focus on the minimal number of robots that are necessary and sufficient to solve the problem in general grids. In more detail, we prove that three deterministic robots are necessary and sufficient, provided that the size of the grid is n×m with 3 ≤ n ≤ m, or n = 2 and m ≥ 4. Perhaps surprisingly, and unlike results for the exploration-with-stop problem (where grids are “easier” to explore and stop than rings with respect to the number of robots), exclusive perpetual exploration requires as many robots in the ring as in the grid. Furthermore, we propose a classification of configurations that drastically reduces the space of configurations to be checked. This pre-processing lays the basis for the automated verification of our algorithm for general grids, as it avoids combinatorial explosion.

Combining Fault-Tolerance and Self-stabilization in Dynamic Systems

Recently, we started to investigate complex fault scenarios. Distributed fault-tolerance can mask the effect of a limited number of permanent faults, while self-stabilization provides forward recovery after an arbitrary number of transient faults hit the system. FTSS (Fault-Tolerant Self-Stabilizing) protocols combine the best of both worlds, since they simultaneously tolerate transient and (permanent) crash faults. To date, deterministic FTSS solutions either consider static (i.e., fixed-point) tasks, or assume synchronous scheduling of the system components. In [30], we proposed a fault-tolerant and stabilizing simulation of an atomic register. The simulation works in asynchronous message-passing systems and allows a minority of processes to crash. The simulation stabilizes in a pragmatic manner, by reaching a long execution in which it runs correctly. A key element in the simulation is a new combinatorial construction of a bounded labeling scheme accommodating arbitrary labels, including those not generated by the scheme itself. Our simulation uses a self-stabilizing implementation of a data link over non-FIFO channels [61]. In [23], we present the first study of deterministic FTSS solutions for dynamic tasks in asynchronous systems, considering the unison problem as a benchmark. Unison can be seen as a local clock synchronization problem, as neighbors must maintain digital clocks at most one time unit away from each other, and increment their own clock value infinitely often. We present several impossibility results for this difficult problem and propose an FTSS solution (when the problem is solvable) for the state model that exhibits optimal fault containment.
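The unison specification above (neighboring clocks at most one apart, every clock incremented infinitely often) admits a classic local rule: a node ticks when it is not ahead of any neighbor. The sketch below simulates that textbook rule under synchronous steps purely as an illustration of the task; it is not the FTSS algorithm of [23], which additionally tolerates crashes and asynchrony.

```python
# Illustrative simulation of the unison task: a node increments its
# clock when its value is <= every neighbor's value. Shown only to
# make the specification concrete; this is a textbook rule, not the
# fault-tolerant self-stabilizing algorithm of [23].

def step(clocks, graph):
    """One synchronous step: every enabled node increments its clock."""
    enabled = [v for v in graph
               if all(clocks[v] <= clocks[u] for u in graph[v])]
    for v in enabled:
        clocks[v] += 1
    return clocks

graph = {0: [1], 1: [0, 2], 2: [1]}    # a 3-node path
clocks = {0: 0, 1: 1, 2: 0}            # legitimate: drift at most one
for _ in range(5):
    step(clocks, graph)
    # safety: neighboring clocks stay at most one unit apart
    assert all(abs(clocks[v] - clocks[u]) <= 1
               for v in graph for u in graph[v])
```

Starting from an arbitrary (corrupted) configuration, or with some nodes crashed, this naive rule can deadlock or violate safety, which is exactly the gap the impossibility results and the FTSS solution in [23] address.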